ANEC: An Amharic Named Entity Corpus and Transformer Based Recognizer
Authors
Abstract
Named Entity Recognition (NER) is an information extraction task that serves as a pre-processing step for other natural language processing tasks, such as machine translation, information retrieval, and question answering. Named entity recognition enables the identification of proper names as well as temporal and numeric expressions in open-domain text. For Semitic languages such as Arabic, Amharic, and Hebrew, named entity recognition is more challenging due to the heavily inflected structure of these languages. In this study, we annotate a new, comparatively large Amharic named entity dataset and make it publicly available. Using this dataset, we build multiple Amharic named entity recognition systems based on recent deep learning approaches, including transfer learning (RoBERTa) and a bidirectional long short-term memory (BiLSTM) network coupled with a conditional random fields (CRF) layer. By applying the Synthetic Minority Over-sampling Technique (SMOTE) to mitigate the imbalanced classification problem, our best-performing RoBERTa-based system achieves an F1-score of 93%, which is a state-of-the-art result for Amharic named entity recognition.
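The abstract names SMOTE as the remedy for class imbalance but does not say how it was applied to NER features (for example, which token representations were oversampled). As a rough illustration only, here is a minimal sketch of the generic SMOTE idea: each synthetic sample is a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, its parameters, and the toy vectors are hypothetical and are not the authors' pipeline. For production use, the imbalanced-learn library provides a `SMOTE` class.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical SMOTE-style oversampler (illustrative sketch only).

    minority: list of feature vectors (lists of floats) from the minority class.
    Returns n_new synthetic vectors, each interpolated between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute each sample's k nearest neighbours (Euclidean, excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # random minority sample x
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nn = minority[i], minority[j]
        # x + gap * (nn - x): a point on the segment from x to nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # made-up 2-D vectors
    for point in smote(toy_minority, n_new=4, k=2):
        print(point)
```

Because every synthetic point is a convex combination of two existing minority samples, the oversampled data stays inside the region the minority class already occupies rather than duplicating points exactly.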
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3243468